Wordification: Propositionalization by unfolding relational data into bags of words
Authors
Abstract
Inductive Logic Programming (ILP) and Relational Data Mining (RDM) address the task of inducing models or patterns from multi-relational data. One of the established approaches to RDM is propositionalization, which transforms a relational database into a single-table representation. This paper presents a propositionalization technique called wordification, which can be seen as a transformation of a relational database into a corpus of text documents. Wordification constructs simple, easy-to-understand features that act as words in the transformed bag-of-words representation. The paper describes the wordification methodology, together with an experimental comparison of several propositionalization approaches on seven relational datasets. The main advantages of the approach are simple implementation, accuracy comparable to competitive methods, and greater scalability, as it performs several times faster on all experimental databases. Furthermore, the wordification methodology and the evaluation procedure are implemented as executable workflows in the web-based data mining platform ClowdFlows. The implemented workflows also include several other ILP and RDM algorithms, as well as utility components that were added to the platform to make these techniques accessible to a wider research audience. © 2015 Elsevier Ltd. All rights reserved.
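To make the idea of unfolding relational data into a bag of words concrete, the following is a minimal sketch, not the authors' implementation. It assumes a word format of `table_attribute_value`, a single one-to-many link between a main table and one secondary table, and a hypothetical toy dataset (`trains`, `cars`); the function name `wordify` and all parameters are illustrative.

```python
from collections import Counter


def wordify(main_rows, child_rows, main_name, child_name, key):
    """Build one bag of words (a 'document') per row of the main table.

    Assumption: each word has the form table_attribute_value, and the
    child table refers to the main table through the shared column `key`.
    """
    docs = {}
    for row in main_rows:
        # Words generated from the main-table row itself (key column skipped).
        words = [f"{main_name}_{attr}_{val}"
                 for attr, val in row.items() if attr != key]
        # Words generated from every linked row of the secondary table.
        for child in child_rows:
            if child[key] == row[key]:
                words += [f"{child_name}_{attr}_{val}"
                          for attr, val in child.items() if attr != key]
        docs[row[key]] = Counter(words)
    return docs


# Hypothetical toy data: two trains, each linked to one or more cars.
trains = [
    {"train_id": 1, "length": "short"},
    {"train_id": 2, "length": "long"},
]
cars = [
    {"train_id": 1, "shape": "rectangle", "roof": "none"},
    {"train_id": 1, "shape": "rectangle", "roof": "peaked"},
    {"train_id": 2, "shape": "oval", "roof": "none"},
]

corpus = wordify(trains, cars, "train", "car", "train_id")
for train_id, bag in corpus.items():
    print(train_id, dict(bag))
```

Each resulting `Counter` plays the role of one document in the corpus; the word counts could then be arranged into a single table and passed to a propositional learner.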
Related resources
A Wordification Approach to Relational Data Mining: Early Results
This paper describes a propositionalization technique called wordification. Wordification is inspired by text mining and can be seen as a transformation of a relational database into a corpus of documents. As in previous propositionalization methods, after the wordification step any propositional data mining algorithm can be applied. The most notable advantage of the presented technique is grea...
Ensemble Relational Learning based on Selective Propositionalization
Dealing with structured data requires expressive representation formalisms, which in turn raise the problem of the computational complexity of the machine learning process. Furthermore, real-world domains require tools able to manage their typical uncertainty. Many statistical relational learning approaches try to deal with these problems by combining the construction of releva...
Flexible propositionalization of continuous attributes in relational data mining
In a relational database, data are stored in primary and secondary tables. Propositionalization can transform a relational database into a single attribute-value table, and hence becomes a useful technique for mining relational databases. However, most of the existing propositionalization approaches deal with categorical attributes, and cannot handle a threshold on an attribute and a threshold ...
On propositionalization for knowledge discovery in relational databases
Propositionalization is a process that leads from relational data and background knowledge to a single-table representation thereof, which serves as the input to widespread systems for knowledge discovery in databases. Systems for propositionalization thus support the analyst during the usually costly phase of data preparation for data mining. Such systems have been applied for more than 15 yea...
Efficiency-conscious propositionalization for relational learning
Systems aiming at discovering interesting knowledge in data, now commonly called data mining systems, are typically employed in finding patterns in a single relational table. Most mainstream data mining tools are not applicable in the more challenging task of finding knowledge in structured data represented by a multi-relational database. Although a family of methods known as inductive logic pro...
Journal title:
- Expert Syst. Appl.
Volume 42, issue
Pages -
Publication date 2015